Reproducing Russian NER Baseline Quality without Additional Data
نویسندگان
چکیده
Baseline solutions for the named entity recognition task in Russian language were published a few years ago. These solutions rely heavily on the addition data, like databases, and different kinds of preprocessing. Here we demonstrate that it is possible to reproduce the quality of existing database-based solution by character-aware neural net trained on corpus itself only.
منابع مشابه
Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition
Named Entity Recognition (NER) is one of the most common tasks of the natural language processing. The purpose of NER is to find and classify tokens in text documents into predefined categories called tags, such as person names, quantity expressions, percentage expressions, names of locations, organizations, as well as expression of time, currency and others. Although there is a number of appro...
متن کاملAnalysis of Multiword Expression Translation Errors in Statistical Machine Translation
In this paper, we analyse the usage of multiword expressions (MWE) in Statistical Machine Translation (SMT). We exploit the Moses SMT toolkit to train models for French-English and Czech-Russian language pairs. For each language pair, two models were built: a baseline model without additional MWE data and the model enhanced with information on MWE. For the French-English pair, we tried three me...
متن کاملCombining Named Entity Recognition Methods for Concept Extraction in Microposts
NER in microposts is a key and challenging task of mining semantics from social media. Our evaluation of a number of popular NE recognizers over a micropost dataset has shown a significant drop-off in results quality. Current state-of-theart NER methods perform much better on formal text than on microposts. However, the experiment provided us with an interesting observation – although individua...
متن کاملOmnifluent English-to-French and Russian-to-English Systems for the 2013 Workshop on Statistical Machine Translation
This paper describes OmnifluentTM Translate – a state-of-the-art hybrid MT system capable of high-quality, high-speed translations of text and speech. The system participated in the English-to-French and Russian-to-English WMT evaluation tasks with competitive results. The features which contributed the most to high translation quality were training data sub-sampling methods, document-specific ...
متن کاملOmnifluentTM English-to-French and Russian-to-English Systems for the 2013 Workshop on Statistical Machine Translation
This paper describes OmnifluentTM Translate – a state-of-the-art hybrid MT system capable of high-quality, high-speed translations of text and speech. The system participated in the English-to-French and Russian-to-English WMT evaluation tasks with competitive results. The features which contributed the most to high translation quality were training data sub-sampling methods, document-specific ...
متن کامل